Session 1
“(…) the world is our lab, and the many diverse people in it are our subjects” (Angrist, 2015)
Uncover the causal effect of an intervention
We use data to answer these “cause-and-effect” questions!
We use data to answer these “cause-and-effect” questions!
We use data to answer these “cause-and-effect” questions!
Correlation \(\neq\) Causation
Conceptual level
Operational level
The t-test measures how far the sample mean is away from \(H_0\)
Hypothesis:
Research question: Does beer affect performance?
becks_data %>%
group_by(treatment) %>%
summarise(across("performance" , ~ mean(.x))) %>%
mutate(beer = case_when(treatment == 1 ~ "Beer",
treatment == 0 ~ "No beer"),
# Cosmetic change to reorder so "Beer" is on the left
beer = factor(beer, levels = c("Beer", "No beer"))) %>%
ggplot(aes(x = beer, y = performance, group = beer, fill = beer)) +
geom_col() +
scale_y_continuous(breaks = seq(0,10,1)) +
scale_fill_manual(name = "", values = c("#F7AB64", "#93B8D6")) +
labs(x="", y="Grade") +
ggthemes::theme_hc(base_size = 18) +
theme(legend.position="") +
ylim(0,10)
Variable | Beer | No beer | Difference | t-stat. | p-value |
---|---|---|---|---|---|
Performance | 8.44 | 7.00 | 1.43 | -10.09 | 0.00 |
Education | 3.73 | 2.92 | 0.82 | -6.54 | 0.00 |
Gender | 0.40 | 0.40 | 0.00 | -0.11 | 0.91 |
# Plot results for simple group comparison - condition on observed factors
becks_data %>%
mutate(high_edu = case_when(education > mean(becks_data$education) ~ 1,
education <= mean(becks_data$education) ~ 0) %>% as.character()) %>%
group_by(treatment, high_edu) %>%
summarise(across("performance" , ~ mean(.x))) %>%
mutate(beer = case_when(treatment == 1 ~ "Beer",
treatment == 0 ~ "No beer"),
# Cosmetic change to reorder so "Beer" is on the left
beer = factor(beer, levels = c("Beer", "No beer"))) %>%
ggplot(aes(x = beer, y = performance, group = high_edu, color = high_edu)) +
geom_point(size = 2) +
geom_line(aes(group = high_edu), linetype = "dashed", size = 1.5 ) +
scale_y_continuous(breaks = seq(0,10,1)) +
scale_color_manual(values = c("0" = "#93B8D6", "1" = "#F7AB64"), name = "High education") +
labs(x="", y="Grade") +
ggthemes::theme_hc(base_size = 18) +
theme(legend.position="bottom")+
ylim(4,10)
becks_data %>%
group_by(treatment, gender) %>%
mutate(gender = as.character(gender)) %>%
summarise(across("performance" , ~ mean(.x))) %>%
mutate(beer = case_when(treatment == 1 ~ "Beer",
treatment == 0 ~ "No beer"),
# Cosmetic change to reorder so "Beer" is on the left
beer = factor(beer, levels = c("Beer", "No beer"))) %>%
ggplot(aes(x = beer, y = performance, group = gender, color = gender)) +
geom_point(size = 2) +
geom_line(aes(group = gender), linetype = "dashed", size = 1.5) +
scale_y_continuous(breaks = seq(0,10,1)) +
scale_color_manual(values = c("0" = "#93B8D6", "1" = "#F7AB64"), name = "Gender") +
labs(x="", y="Grade") +
ggthemes::theme_hc(base_size = 18) +
theme(legend.position="bottom")+
ylim(4,10)
Variable | Beer | No beer | Difference | t-stat. | p-value |
---|---|---|---|---|---|
Performance | 8.44 | 7.00 | 1.43 | -10.09 | 0.00 |
Talent | 5.79 | 3.28 | 2.51 | -24.76 | 0.00 |
Education | 3.73 | 2.92 | 0.82 | -6.54 | 0.00 |
Gender | 0.40 | 0.40 | 0.00 | -0.11 | 0.91 |
library(ggcorrplot)
# Calculate correlations
# I make a copy of the dataset for nicer names in the correlation plot
becks_data_corr <- becks_data
names(becks_data_corr) <- names(becks_data) %>% str_to_sentence()
# Calculate correlations
becks_data_corr <- becks_data
names(becks_data_corr) <- names(becks_data) %>% str_to_sentence()
corr <-
becks_data_corr %>%
select(- Student , -Treatment_random) %>%
cor(.) %>%
round(.,2)
ggcorrplot(corr, hc.order = TRUE,type = "upper", lab = TRUE) +
xlab("") + ylab("") +
ggthemes::theme_hc(base_size = 16) +
theme(legend.position="bottom")
rm(becks_data_corr)
Variable | Beer | No beer | Difference | t-stat. | p-value |
---|---|---|---|---|---|
Performance | 8.44 | 7.00 | 1.43 | 0.12 | 0.90 |
Education | 3.73 | 2.92 | 0.82 | -0.23 | 0.82 |
Gender | 0.40 | 0.40 | 0.00 | 0.40 | 0.69 |
Talent | 5.79 | 3.28 | 2.51 | 0.63 | 0.53 |
becks_data %>%
group_by(treatment_random) %>%
summarise(across("performance" , ~ mean(.x))) %>%
mutate(beer = case_when(treatment_random == 1 ~ "Beer",
treatment_random == 0 ~ "No beer"),
# Cosmetic change to reorder so "Beer" is on the left
beer = factor(beer, levels = c("Beer", "No beer"))) %>%
ggplot(aes(x = beer, y = performance, group = beer, fill = beer)) +
geom_col() +
scale_y_continuous(breaks = seq(0,10,1)) +
scale_fill_manual(name = "", values = c("#F7AB64", "#93B8D6")) +
labs(x="", y="Grade") +
ggthemes::theme_hc(base_size = 18) +
theme(legend.position="bottom") +
ylim(0,10)
Controlling variation in the causal variable, e.g. beer drinking
Makes sure that the treatment and control group are similar along observable and unobservable dimensions
The only difference between the two groups is the treatment
This allows us to attribute any difference in outcomes to the treatment
No selection bias (endogeneity issue)
Types of experiments:
(1) After | (2) Before | (1) - (2) | |
---|---|---|---|
(a) Treatment | Y\(_{treated,\ after}\) | Y\(_{treated,\ before}\) | \(\Delta_{treated}\) |
(b) Control | Y\(_{control,\ after}\) | Y\(_{control,\ before}\) | \(\Delta_{control}\) |
(a) - (b) | \(\Delta_{after}\) | \(\Delta_{before}\) | DiD |
\[Y = \beta_0 + \beta_1 Treated + \beta_2 After + \beta_3 Treated \times After + \epsilon\]
The difference-in-differences regression gives you the same estimate as if you took differences in the group averages
It takes also care of any unobserved constant differences between subjects and time trends!
\[Y = \beta_0 + \beta_1 Treated + \beta_2 After + \beta_3 Treated \times After + \epsilon\]
(1) After | (2) Before | (1) - (2) | |
---|---|---|---|
(a) Treatment | \(\beta_0 + \beta_1+\beta_2+\beta_3\) | \(\beta_0 + \beta_1\) | \(\beta_2+\beta_3\) |
(b) Control | \(\beta_0 + \beta_2\) | \(\beta_0\) | \(\beta_2\) |
(a) - (b) | \(\beta_1+\beta_3\) | \(\beta_1\) | \(\beta_3\) |
“In order to estimate the effect of legal cannabis access on student performance, we exploit a unique natural experiment that temporarily discriminated legal access to cannabis based on nationality. We apply a difference-in-differences approach across time and nationality groups.”
Academic performance of students who are no longer legally permitted to buy cannabis increases
Grade improvements are driven by younger students
Effects are stronger for women and low performers
Performance gains are larger for courses that require more numerical/mathematical skills
Performance gains are driven by an improved understanding not than changes in students’ study effort
# Eye-test
did_data %>%
group_by(treatment, year) %>%
summarise(performance = mean(performance, rm.na = TRUE)) %>%
mutate(type = case_when(treatment == 1 ~ "Treated group",
TRUE ~ "Control group") ) %>%
ggplot(aes(x = year, y = performance, color = type) ) +
geom_point(size = 2) +
geom_line(size = 1) +
scale_color_manual(values = c( "#93B8D6", "#F7AB64"), name = "" ) +
geom_vline(xintercept = 0, linetype = "dashed", color = "grey") +
scale_y_continuous(breaks = seq(4, 8, .5),limits = c(5, 8) ) +
ggthemes::theme_hc(base_size = 18) +
labs( x = "Time", y ="Grade") +
theme(legend.position="bottom")
After | Before | Difference | |
---|---|---|---|
Treated group | 7.11 | 6.02 | 1.09 |
Control group | 7.49 | 7.51 | -0.02 |
Difference | -0.37 | -1.49 | 1.11 |
# Descriptive statistics
table_did <-
did_data %>%
group_by(treatment, after) %>%
summarise(performance = mean(performance, rm.na = TRUE)) %>%
spread(after, performance) %>%
mutate(Difference = `1` - `0`) %>%
rename(" " = treatment,
"After" = `1`,
"Before" = `0`) %>%
select(" ", "After", "Before", Difference) %>%
mutate(" " = case_when(` ` == 1 ~ "Treated group",
TRUE ~ "Control group") ) %>%
arrange(Before)
table_did <- table_did %>% bind_rows(
# Add difference column
tibble(
" " = "Difference",
After = table_did$After[1] - table_did$After[2],
Before = table_did$Before[1] - table_did$Before[2],
Difference = After - Before))
table_did %>%
kable(digits = 2) %>%
kable_styling(., "striped", position = "left", font_size = 35)
\(\text{Treatment} \times \text{After}\)
Crucial assumption: Parallel trends between treatment and control in the pre-period!
# Set Table style
# Dictionary => set only once per session
dict <- setFixest_dict( c(performance = "Grade",
treatment = "Treatment",
after = "After",
student = "Student ID",
year = "Year"
))
# The style of the table
my_style = style.tex(tpt = TRUE,
notes.tpt.intro = "\\footnotesize")
setFixest_etable(style.tex = my_style, markdown = TRUE)
# Run DiD regression
a <- feols(performance ~ treatment * after , data = did_data, vcov = "hetero" )
b <- feols(performance ~ treatment:after | student + year , data = did_data)
# Output/Export table
etable( a,b,
title = "Effect of policy change on student grades",
digits = 3,
tex = TRUE,
fitstat = ~ar2 + n,
replace = T,
style.tex = style.tex("aer"),
highlight = .("rowcol, #F7AB64, se" = "treatment:after"),
coef.just = "l",
placement = "h!",
order = c("!Constant", "^treatment:after$", "treatment", "after"),
headers = list("OLS" = 1, "Fixed effects" = 1),
view = T)
)
After | Before | Difference | |
---|---|---|---|
Treated group | 7.11 | 6.02 | 1.09 |
Control group | 7.49 | 7.51 | -0.02 |
Difference | -0.37 | -1.49 | 1.11 |
feols(performance ~ i(year, treatment, -1) | student + year , data = did_data) %>%
etable(.,
#title = "Effect of policy change on student grades",
digits = 3,
tex = TRUE,
fitstat = ~ar2 + n,
replace = T,
style.tex = style.tex("aer"),
highlight = .("rowcol, #C9DBEA, se" = c("year::-2:treatment"),
"rowcol, #E7E8EE, se" = c("year::1:treatment","year::2:treatment")),
coef.just = "l",
placement = "h!",
view = T
)
feols(performance ~ i(year, treatment, -1) | student + year , data = did_data) %>%
ggiplot(
ref.line = -1,
main = "",
xlab = "Time to treatment",
multi_style = "facet",
geom_style = "ribbon",
col = '#F7AB64',
#facet_args = list(labeller = labeller(id = \(x) gsub(".*: ", "", x))),
theme = ggthemes::theme_hc(base_size = 18) +
theme(
text = element_text(),
plot.title = element_text(hjust = 0.5),
legend.position = "none"
)
)
Good luck with your BSc projects!
References & useful resources
How to review a paper (general advice)
The effect: An introduction to research design and causality
Research Design in the Social Sciences: Declaration, Diagnosis, and Redesign
Accounting Research: An Introductory Course
Data Science for Economists and Other Animals
A Gentle Guide to the Grammar of Graphics with ggplot2 (slideshow)
R Graphics Cookbook, 2nd edition
Check all relevant assumptions for a regression model in one go
Advanced research methods